16 research outputs found

The structure of verbal sequences analyzed with unsupervised learning techniques

Author: Bennani Younès
Recanati Catherine
Rogovschi Nicoleta
Publication date: 01/10/2007

Data mining makes it possible to explore sequences of phenomena, whereas analyses usually focus on isolated phenomena or on the relation between two phenomena. It offers invaluable tools for theoretical analysis and for exploring the structure of sentences, texts, dialogues, and speech. We report here the results of an attempt to use it to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The inputs to the analyzer consisted solely of the verbs appearing in the sentences. The analysis classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We give here an interpretation of these clusters by applying a statistical analysis to independent semantic annotations.

arXiv.org e-Print Archive

HAL-Paris 13

Model-based Co-clustering for High Dimensional Sparse Data

Author: Aghiles Salah
Mohamed Nadif
Nicoleta Rogovschi
Publication date: 05/03/2020

We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high-dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. It thereby has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel co-clustering algorithms, one hard and one soft. Empirical results on numerous synthetic and real-world text datasets demonstrate the effectiveness of our approach for modelling high-dimensional sparse data and for co-clustering. Furthermore, because our formulation performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters (kappa), a well-known difficulty in classical vMF-based models.

CiteSeerX

Enchaînements verbaux - étude sur le temps et l'aspect utilisant des techniques d'apprentissage non supervisé [Verb sequences - a study of tense and aspect using unsupervised learning techniques]

Author: Recanati Catherine
Rogovschi Nicoleta
Publication venue: IRIT Press
Publication date: 01/06/2007

10 pages. National audience. Unsupervised learning allows the discovery of initially unknown categories. Current techniques make it possible to explore sequences of phenomena, whereas analyses tend to focus on isolated phenomena or on the relation between two phenomena. They thus offer invaluable tools for the analysis of sequential data and, in particular, for the discovery of textual structures. We report here the results of a first attempt to use them to inspect sequences of verbs taken from French accounts of road accidents. Verbs were encoded as pairs (cat, tense), where cat is the aspectual category of a verb and tense is its grammatical tense. The analysis, based on an original approach, classified the links between two successive verbs into four distinct groups (clusters), allowing text segmentation. We give here an interpretation of these clusters by using statistics on semantic annotations that are independent of the training process.

HAL-Paris 13

Hybrid Unsupervised Learning to Uncover Discourse Structure

Author: Bennani Younès
Recanati Catherine
Rogovschi Nicoleta
Publication venue: Springer Science and Business Media LLC
Publication date: 05/10/2007

Volume of the best papers of LTC'07. International audience. Data mining makes it possible to explore sequences of phenomena, whereas analyses usually focus on isolated phenomena or on the relation between two phenomena. It offers invaluable tools for theoretical analysis and for exploring the structure of sentences, texts, dialogues, and speech. We report here the results of an attempt to use it to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The inputs to the analyzer consisted solely of the verbs appearing in the sentences. The analysis classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We give here an interpretation of these clusters by comparing the statistical distribution of independent semantic annotations.

HAL-Paris 13

Stochastic Co-clustering for Document-Term Data

Author: Mohamed Nadif
Nicoleta Rogovschi
Aghiles Salah
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 05/05/2016

International audience. Co-clustering is more useful than one-sided clustering when dealing with high-dimensional sparse data. We propose to address document clustering with a generative, model-based co-clustering approach. To this end, we rely on a particular mixture of von Mises-Fisher distributions and propose a new parsimonious model that reveals a block-diagonal structure as well as a good partitioning of documents and terms. Then, by estimating the model parameters under the maximum likelihood (ML) approach, we derive three novel co-clustering algorithms: a soft one and two stochastic variants. Empirical results on numerous simulated and real-world datasets demonstrate the advantages of our approach for modelling and co-clustering high-dimensional sparse data.

HAL Descartes

An Efficient Incremental Collaborative Filtering System

Author: Aghiles Salah
Mohamed Nadif
Nicoleta Rogovschi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

International audience. Collaborative filtering (CF) systems aim to recommend a set of personalized items to an active user, according to the preferences of other similar users. Many methods have been developed, and some, such as those based on similarity and matrix factorization (MF), can achieve very good recommendation accuracy, but unfortunately they are computationally prohibitive. Applying such approaches to real-world applications, in which the available information changes frequently, is therefore a non-trivial task. To address this problem, we propose a novel, efficient incremental CF system based on a weighted clustering approach. Our system is able to provide high-quality recommendations at a very low computational cost. Experimental results on several real-world datasets confirm the efficiency and effectiveness of our method by demonstrating that it is significantly better than existing incremental CF methods in terms of both scalability and recommendation quality.

HAL Descartes

Hal-Diderot

Sequencing of verbs - a study on tense and aspect using unsupervised learning

Author: Bennani Younès
Recanati Catherine
Rogovschi Nicoleta
Publication venue: Institute for Parallel Processing, Bulgarian Academy of Sciences
Publication date: 01/09/2007

International audience. We report here the results of an attempt to use data mining tools to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The inputs to the analyzer consisted solely of the verbs appearing in the sentences. The analysis classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We give here an interpretation of these clusters by applying a statistical analysis to independent semantic annotations.

HAL-Paris 13

Apprentissage neuro-markovien pour la classification non supervisée de données structurées en séquences [Neuro-Markovian learning for the unsupervised classification of sequence-structured data]

Author: Bennani Younès
Recanati Catherine
Rogovschi Nicoleta
Publication venue: Cépaduès Editions
Publication date: 01/01/2007

International audience.

HAL-Paris 13

Semantic Type Detection in Tabular Data via Machine Learning Using Semi-synthetic Data

Author: Boufarès Faouzi
Chevallier Marc
Grozavu Nistor
Rogovschi Nicoleta
Publication venue: Springer Nature Switzerland
Publication date: 15/12/2022

HAL-Paris 13